|
|
|
|
|
|
|
Choice of programming language |
|
Fourth generation languages |
|
Good programming practice |
|
Coding standards |
|
Module reuse |
|
Module test case selection |
|
Black-box module-testing techniques |
|
Glass-box module-testing techniques |
|
|
|
|
Code walkthroughs and inspections |
|
Comparison of module-testing techniques |
|
Cleanroom |
|
Potential problems when testing objects |
|
Management aspects of module testing |
|
When to rewrite rather than debug a module |
|
CASE tools for the implementation phase |
|
Air Gourmet Case Study: Black-box test cases |
|
Challenges of the implementation phase |
|
|
|
|
|
Programming-in-the-many |
|
Choice of Programming Language |
|
Language is usually specified in contract |
|
But what if the contract specifies |
|
The product is to be implemented in the “most
suitable” programming language |
|
What language should be chosen? |
|
|
|
|
|
Example |
|
QQQ Corporation has been writing COBOL programs
for over 25 years |
|
Over 200 software staff, all with COBOL
expertise |
|
What is “most suitable” programming language? |
|
Obviously COBOL |
|
|
|
|
|
|
What happens when new language (C++, say) is
introduced |
|
New hires |
|
Retrain existing professionals |
|
Future products in C++ |
|
Maintain existing COBOL products |
|
Two classes of programmers |
|
COBOL maintainers (despised) |
|
C++ developers (paid more) |
|
Need expensive software, and hardware to run it |
|
100s of person-years of expertise with COBOL
wasted |
|
|
|
|
|
Only possible conclusion |
|
COBOL is the “most suitable” programming
language |
|
And yet, the “most suitable” language for the
latest project may be C++ |
|
COBOL is suitable only for data-processing (DP) applications |
|
How to choose a programming language |
|
Cost-benefit analysis |
|
Compute costs, benefits of all relevant
languages |
|
|
|
|
|
Which is the most appropriate object-oriented
language? |
|
C++ is (unfortunately) C-like |
|
Java enforces the object-oriented paradigm |
|
Training in the object-oriented paradigm is
essential before adopting any object-oriented language |
|
What about choosing a fourth generation language
(4GL)? |
|
|
|
|
|
First generation languages |
|
Machine languages |
|
Second generation languages |
|
Assemblers |
|
Third generation languages |
|
High-level languages (COBOL, FORTRAN, C++) |
|
Fourth generation languages (4GLs) |
|
One 3GL statement is equivalent to 5–10
assembler statements |
|
Each 4GL statement intended to be equivalent to
30 or even 50 assembler statements |
|
|
|
|
|
|
It was hoped that 4GLs would |
|
Speed up application-building |
|
Applications easy, quick to change |
|
Reducing maintenance costs |
|
Simplify debugging |
|
Make languages user friendly |
|
Leading to end-user programming |
|
Achievable if 4GL is a user friendly, very
high-level language |
|
|
|
|
|
Example |
|
(“Just in Case You Wanted to Know” box, page
438) |
|
The power of a nonprocedural language, and the
price |
|
|
|
|
|
|
|
The picture is not uniformly rosy |
|
Problems with |
|
Poor management techniques |
|
Poor design methods |
|
James Martin suggests use of |
|
Prototyping |
|
Iterative design |
|
Computerized data management |
|
Computer-aided structuring |
|
Is he right?
Does he (or anyone else) know? |
|
|
|
|
|
Playtex used ADF, obtained an 80 to 1
productivity increase over COBOL |
|
However, Playtex then used COBOL for later
applications |
|
4GL productivity increases of 10 to 1 over COBOL
have been reported |
|
However, there are plenty of reports of bad
experiences |
|
|
|
|
|
Attitudes of 43 Organizations to 4GLs |
|
Use of 4GL reduced users’ frustrations |
|
Quicker response from DP department |
|
4GLs slow and inefficient, on average |
|
Overall, 28 organizations using 4GL for over 3
years felt that the benefits outweighed the costs |
|
|
|
|
|
Market share |
|
No one 4GL dominates the software market |
|
There are literally hundreds of 4GLs |
|
Dozens with sizable user groups |
|
Reason |
|
No one 4GL has all the necessary features |
|
Conclusion |
|
Care has to be taken in selecting the
appropriate 4GL |
|
|
|
|
Large sums for training |
|
Management techniques for 4GL, not for COBOL |
|
Design methods must be appropriate, especially
computer-aided design |
|
Interactive prototyping |
|
4GLs and complex products |
|
|
|
|
|
Dangers of a 4GL |
|
Deceptive simplicity |
|
End-user programming |
|
|
|
|
|
Use of “consistent” and “meaningful” variable
names |
|
“Meaningful” to future maintenance programmer |
|
“Consistent” to aid maintenance programmer |
|
|
|
|
|
Module contains variables freqAverage,
frequencyMaximum, minFr, frqncyTotl |
|
Maintenance programmer has to know if freq,
frequency, fr, frqncy all refer to the same thing |
|
If so, use identical word, preferably frequency,
perhaps freq or frqncy, not fr |
|
If not, use different word (e.g., rate) for
different quantity |
|
Can use frequencyAverage, frequencyMaximum,
frequencyMinimum, frequencyTotal |
|
Can also use averageFrequency, maximumFrequency,
minimumFrequency, totalFrequency |
|
All four names must come from the same set |
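A minimal sketch of one consistent naming set in use. The function name `summarize_frequencies` and the sample readings are hypothetical, and the names are written in Python's snake_case rather than the camelCase used on the slide.

```python
def summarize_frequencies(readings):
    """Return total, average, minimum and maximum of a list of frequency readings."""
    # All four names use the same word ("frequency") in the same position,
    # so a maintenance programmer can see they describe the same quantity.
    frequency_total = sum(readings)
    frequency_average = frequency_total / len(readings)
    frequency_minimum = min(readings)
    frequency_maximum = max(readings)
    return {
        "frequency_total": frequency_total,
        "frequency_average": frequency_average,
        "frequency_minimum": frequency_minimum,
        "frequency_maximum": frequency_maximum,
    }


# Hypothetical sample data, for illustration only.
summary = summarize_frequencies([10.0, 20.0, 30.0])
```

Mixing in a name such as `min_fr` would force the reader to check whether it refers to the same quantity.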
|
|
|
|
|
Issue of self-documenting code |
|
Exceedingly rare |
|
Key issue: Can module be understood easily and
unambiguously by |
|
SQA team |
|
Maintenance programmers |
|
All others who have to read the code |
|
|
|
|
|
Example |
|
Variable xCoordinateOfPositionOfRobotArm |
|
Abbreviated to xCoord |
|
Entire module deals with the movement of the
robot arm |
|
But does the maintenance programmer know this? |
|
|
|
|
|
|
Mandatory at top of every single module |
|
Minimum information |
|
Module name |
|
Brief description of what the module does |
|
Programmer’s name |
|
Date module was coded |
|
Date module was approved, and by whom |
|
Module parameters |
|
Variable names, in alphabetical order, and uses |
|
Files accessed by this module |
|
Files updated by this module |
|
Module i/o |
|
Error handling capabilities |
|
Name of file of test data (for regression
testing) |
|
List of modifications made, when, approved by
whom |
|
Known faults, if any |
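The minimum information listed above could be laid out as a prologue like the following. This is only a sketch: the module name, the function, and every bracketed entry are hypothetical placeholders, and a Python docstring stands in for whatever comment form the project's language uses.

```python
"""
Module name:      frequency_report (hypothetical)
Description:      Computes the average of a list of frequency readings.
Programmer:       [programmer's name]
Date coded:       [date]
Approved:         [date], by [approver]
Parameters:       readings, a non-empty list of numbers
Variables:        frequency_average: mean of the readings
                  frequency_total: sum of the readings
Files accessed:   none
Files updated:    none
Module i/o:       none
Error handling:   raises ValueError if readings is empty
Test data file:   [name of regression test data file]
Modifications:    [what was changed, when, approved by whom]
Known faults:     none
"""


def average_frequency(readings):
    # Reject empty input, as promised under "Error handling" in the prologue.
    if not readings:
        raise ValueError("readings must not be empty")
    frequency_total = sum(readings)
    frequency_average = frequency_total / len(readings)
    return frequency_average
```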
|
|
|
|
|
Suggestion |
|
Comments are essential whenever code is written
in a non-obvious way, or makes use of some subtle aspect of the language |
|
Nonsense! |
|
Recode in a clearer way |
|
We must never promote/excuse poor programming |
|
However, comments can assist maintenance
programmers |
|
Code layout for increased readability |
|
Use indentation |
|
Better, use a pretty-printer |
|
Use blank lines |
|
|
|
|
|
Example |
|
Map consists of two squares. Write code to determine whether a point
on the Earth’s surface lies in map square 1 or map square 2, or is not on
the map |
|
|
|
|
|
|
Solution 1.
Badly formatted |
|
|
|
|
|
Solution 2.
Well-formatted, badly constructed |
|
|
|
|
|
|
Solution 3.
Acceptably nested |
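The code for the three solutions is not reproduced here. As a rough sketch of what an acceptably nested solution might look like, the function below assumes a hypothetical layout: both squares span longitudes 120 to 150, square 1 covers latitudes 30 to 60, and square 2 covers latitudes 60 to 90. These bounds and the name `map_square` are assumptions for illustration.

```python
def map_square(latitude, longitude):
    """Return 1 or 2 for the map square containing the point, or None if off the map."""
    # Assumed layout (hypothetical): both squares share longitudes 120..150;
    # square 1 is latitudes 30..60, square 2 is latitudes 60..90.
    if 120 < longitude <= 150:
        if 30 < latitude <= 60:
            return 1
        if 60 < latitude <= 90:
            return 2
    # Every remaining case falls through to one "not on the map" result.
    return None
```

The nesting depth is two, and each condition can be read on its own line.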
|
|
|
|
|
|
Combination of if-if and if-else-if statements
is usually difficult to read |
|
Simplify: The if-if combination |
|
|
|
if <condition1> |
|
if <condition2> |
|
|
|
is frequently equivalent to the single
condition |
|
|
|
if <condition1> && <condition2> |
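The equivalence can be checked by brute force over all truth values. A small sketch, with hypothetical function names; it covers only the case where neither if has an else branch, which is where the two forms agree.

```python
from itertools import product


def nested(condition1, condition2):
    # The if-if combination.
    if condition1:
        if condition2:
            return "action"
    return "no action"


def combined(condition1, condition2):
    # The single combined condition.
    if condition1 and condition2:
        return "action"
    return "no action"


# Compare the two forms on every combination of truth values.
all_cases = list(product([False, True], repeat=2))
results_match = all(nested(a, b) == combined(a, b) for a, b in all_cases)
```

If the inner if has its own else branch, the two forms can differ, which is why the slide says "frequently" rather than "always".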
|
|
|
Rule of thumb |
|
if statements nested to a depth of greater than
three should be avoided as poor programming practice |
|
|
|
|
|
Can be both a blessing and a curse |
|
Modules of coincidental cohesion arise from
rules like |
|
“Every module will consist of between 35 and 50
executable statements” |
|
Better |
|
“Programmers should consult their managers
before constructing a module with fewer than 35 or more than 50 executable
statements” |
|
|
|
|
|
|
No standard can ever be universally applicable |
|
Standards imposed from above will be ignored |
|
Standard must be checkable by machine |
|
|
|
|
|
|
|
Examples of good programming standards |
|
“Nesting of if statements should not exceed a
depth of 3, except with prior approval from the team leader” |
|
“Modules should consist of between 35 and 50
statements, except with prior approval from the team leader” |
|
“Use of gotos should be avoided. However, with prior approval from the
team leader, a forward goto may be
used for error handling” |
|
|
|
|
|
Aim of standards is to make maintenance easier |
|
If a standard makes development difficult, then it must be
modified |
|
Overly restrictive standards are
counterproductive |
|
Quality of software suffers |
|
|
|
|
After preliminary testing by the programmer, the
module is handed over to the SQA group |
|
|
|
|
The most common form of reuse |
|
|
|
|
Worst way—random testing |
|
Need systematic way to construct test cases |
|
|
|
|
|
Two extremes to testing |
|
1. Test to specifications (also called
black-box, data-driven, functional, or input/output driven testing) |
|
Ignore code.
Use specifications to select test cases |
|
2. Test to code (also called glass-box,
logic-driven, structured, or path-oriented testing) |
|
Ignore specifications. Use code to select test cases |
|
|
|
|
|
Example |
|
Specifications for data processing product
include 5 types of commission and 7 types of discount |
|
5 × 7 = 35 test cases |
|
Cannot say that commission and discount are
computed in two entirely separate modules—the structure is irrelevant |
|
|
|
|
|
Suppose specs include 20 factors, each taking on
4 values |
|
$4^{20}$, or about $1.1 \times 10^{12}$, test
cases |
|
If each takes 30 seconds to run, running all
test cases takes > 1 million years |
|
Combinatorial explosion makes testing to
specifications impossible |
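The arithmetic behind this slide can be reproduced directly. The constant names are illustrative, and a 365-day year is assumed.

```python
FACTORS = 20
VALUES_PER_FACTOR = 4
SECONDS_PER_TEST = 30
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumes a 365-day year

# Every combination of factor values is a separate test case.
test_cases = VALUES_PER_FACTOR ** FACTORS
total_years = test_cases * SECONDS_PER_TEST / SECONDS_PER_YEAR

print(f"{test_cases:,} test cases, about {total_years:,.0f} years")
```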
|
|
|
|
|
Each path through module must be executed at
least once |
|
Combinatorial explosion |
|
|
|
|
Flowchart has over $10^{12}$ different
paths |
|
|
|
|
Can exercise every path without detecting every
fault |
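A small illustration of this point, offered as a sketch with a hypothetical function name. The function is meant to report whether three integers are equal, but it compares their mean with the first value instead. Two test cases exercise both paths and pass, yet the fault remains.

```python
def all_equal_faulty(x, y, z):
    # Fault: compares the mean with x instead of comparing the values pairwise.
    if (x + y + z) / 3 == x:
        return True   # path 1: "x, y, z are equal"
    return False      # path 2: "x, y, z are not all equal"


# These two test cases cover both paths and both give the right answer.
path_1_result = all_equal_faulty(2, 2, 2)
path_2_result = all_equal_faulty(1, 2, 3)

# This input follows path 1 again, but the answer is wrong.
undetected_fault = all_equal_faulty(2, 1, 3)
```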
|
|
|
|
|
Path can be tested only if it is present |
|
Weaker Criteria |
|
Exercise both branches of all conditional
statements |
|
Execute every statement |
|
|
|
|
|
|
|
|
|
|
Neither testing to specifications nor testing to
code is feasible |
|
The art of testing: |
|
Select a small, manageable set of test cases to |
|
Maximize chances of detecting fault, while |
|
Minimizing chances of wasting test case |
|
Every test case must detect a previously
undetected fault |
|
|
|
|
|
We need a method that will highlight as many
faults as possible |
|
First black-box test cases (testing to
specifications) |
|
Then glass-box methods (testing to code) |
|
|
|
|
|
Equivalence Testing |
|
Example |
|
Specifications for DBMS state that product must
handle any number of records between 1 and 16,383 ($2^{14} - 1$) |
|
If system can handle 34 records and 14,870
records, then probably will work fine for 8,252 records |
|
If system works for any one test case in range
(1..16,383), then it will probably work for any other test case in range |
|
Range (1..16,383) constitutes an equivalence
class |
|
Any one member is as good a test case as any
other member of the class |
|
|
|
|
|
Range (1..16,383) defines three different
equivalence classes: |
|
Equivalence Class 1: Fewer than 1 record |
|
Equivalence Class 2: Between 1 and 16,383
records |
|
Equivalence Class 3: More than 16,383 records |
|
|
|
|
|
Select test cases on or just to one side of the
boundary of equivalence classes |
|
This greatly increases the probability of
detecting fault |
|
|
|
|
Test case 1: 0 records (member of equivalence class 1 and adjacent to boundary value) |
|
Test case 2: 1 record (boundary value) |
|
Test case 3: 2 records (adjacent to boundary value) |
|
Test case 4: 723 records (member of equivalence class 2) |
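Boundary value selection for a range can be sketched as follows. The helper `boundary_test_values` is hypothetical; it picks values on and just to either side of each boundary, plus one interior member. It uses the midpoint as the interior member, though any member of the class (such as 723) would do equally well.

```python
def boundary_test_values(low, high):
    """Return test values on and adjacent to the boundaries of the range low..high."""
    middle = (low + high) // 2  # any interior member of the class would serve
    return [
        low - 1,   # just outside the lower boundary
        low,       # lower boundary value
        low + 1,   # adjacent to the lower boundary
        middle,    # interior member of the valid class
        high - 1,  # adjacent to the upper boundary
        high,      # upper boundary value
        high + 1,  # just outside the upper boundary
    ]


record_counts = boundary_test_values(1, 16383)
```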
|
|
|
|
|
Example: |
|
In 2001, the minimum Social Security (OASDI)
deduction from any one paycheck was $0.00, and the maximum was $4,984.80 |
|
Test cases must include input data which should
result in deductions of exactly $0.00 and exactly $4,984.80 |
|
Also, test data that might result in deductions
of less than $0.00 or more than $4,984.80 |
|
|
|
|
|
Equivalence classes together with boundary value
analysis to test both input specifications and output specifications |
|
Small set of test data with potential of
uncovering large number of faults |
|
|
|
|
|
Structural testing |
|
Statement coverage |
|
Branch coverage |
|
Linear code sequences |
|
All-definition-use path coverage |
|
|
|
|
|
Statement coverage: |
|
Series of test cases to check every statement |
|
CASE tool needed to keep track |
|
Weakness |
|
Branch statements |
|
|
|
|
|
Both statements can be executed without
the fault showing up |
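The figure for this slide is not reproduced here. As a hedged illustration, suppose the programmer wrote `and` where `or` was intended; the functions and the intended condition below are assumptions made for the example.

```python
def assign_faulty(s, t):
    x = 0
    if s > 1 and t == 0:  # fault: the assumed intent was "s > 1 or t == 0"
        x = 9
    return x


def assign_intended(s, t):
    x = 0
    if s > 1 or t == 0:
        x = 9
    return x


# s = 2, t = 0 executes every statement of the faulty version,
# yet both versions agree, so the fault stays hidden.
hidden = assign_faulty(2, 0) == assign_intended(2, 0)

# s = 2, t = 1 takes the other branch and exposes the difference.
exposed = assign_faulty(2, 1) != assign_intended(2, 1)
```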
|
|
|
|
|
Series of tests to check all branches (solves
above problem) |
|
Again, a CASE tool is needed |
|
Structural testing: path coverage |
|
|
|
|
|
In a product with a loop, the number of paths is
very large, and can be infinite |
|
We want a weaker condition than all paths but
that shows up more faults than branch coverage |
|
Linear code sequences |
|
Identify the set of points L from which control
flow may jump, plus entry and exit points |
|
Restrict test cases to paths that begin and end
with elements of L |
|
This uncovers many faults without testing every
path |
|
|
|
|
|
|
Each occurrence of variable, zz say, is labeled
either as |
|
The definition of a variable, e.g., zz = 1 or read (zz) |
|
or the use of a variable, e.g., y = zz + 3 or if (zz < 9) errorB () |
|
Identify all paths from the definition of a
variable to the use of that definition |
|
This can be done by an automatic tool |
|
A test case is set up for each such path |
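To make the idea concrete, here is a deliberately simplified sketch that pairs each use of a variable with the definition that reaches it. It handles straight-line code only (no branches or loops), and the input format and the name `def_use_pairs` are assumptions; a real tool would work on the control-flow graph.

```python
def def_use_pairs(statements):
    """Pair each use of a variable with the most recent definition before it.

    statements: list of (defined_variables, used_variables), one per line
    of straight-line code. Returns (variable, definition_line, use_line) tuples.
    """
    last_definition = {}
    pairs = []
    for line_number, (defined, used) in enumerate(statements, start=1):
        # Uses are resolved before this line's own definitions take effect.
        for variable in sorted(used):
            if variable in last_definition:
                pairs.append((variable, last_definition[variable], line_number))
        for variable in defined:
            last_definition[variable] = line_number
    return pairs


program = [
    ({"zz"}, set()),   # line 1: zz = 1
    ({"y"}, {"zz"}),   # line 2: y = zz + 3
    (set(), {"zz"}),   # line 3: if (zz < 9) errorB()
]
pairs = def_use_pairs(program)
```

Each resulting pair would get its own test case.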
|
|
|
|
|
Disadvantage: |
|
Upper bound on number of paths is $2^d$,
where d is the number of branches |
|
In practice |
|
The actual number of paths is proportional to d
in real cases |
|
This is therefore a practical test case
selection technique |
|
|
|
|
|
It may not be possible to test a specific
statement |
|
May have an infeasible path (“dead code”) in the
module |
|
Frequently this is evidence of a fault |
|
|
|
|
|
|
|
Quality assurance approach to glass-box testing |
|
Module m1 is more “complex” than module m2 |
|
Metric of software complexity |
|
Highlights modules most likely to have faults |
|
If complexity is unreasonably high, then
redesign, reimplement |
|
Cheaper and faster |
|
|
|
|
|
Simplest measure of complexity |
|
Underlying assumption: |
|
Constant probability p that a line of code
contains a fault |
|
Example |
|
Tester believes line of code has 2% chance of
containing a fault. |
|
If module under test is 100 lines long, then it
is expected to contain 2 faults |
|
Number of faults is indeed related to the size
of the product as a whole |
|
|
|
|
|
Cyclomatic complexity M (McCabe) |
|
Essentially the number of decisions (branches)
in the module |
|
Easy to compute |
|
A surprisingly good measure of faults (but see
later) |
|
Modules with M > 10 have statistically more
errors (Walsh) |
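As a rough sketch of how easily the metric can be computed, the function below counts decision points in Python source using the standard `ast` module and adds one, following the usual formulation of McCabe's metric. Which constructs count as decisions is an assumption of this sketch; real tools differ in the details.

```python
import ast


def cyclomatic_complexity(source):
    """Approximate McCabe's M as (number of decision points) + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.For, ast.While, ast.IfExp)):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds two extra decision points.
            decisions += len(node.values) - 1
    return decisions + 1


SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for i in range(n):
        if i % 2 == 0 and i > 2:
            pass
    return "done"
'''

sample_complexity = cyclomatic_complexity(SAMPLE)
```

The sample has two if statements, one loop, and one `and`, giving four decisions and M = 5.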
|
|
|
|
|
[Halstead] Used for fault prediction |
|
Basic elements are the number of operators and
operands in the module |
|
Widely challenged |
|
Example |
|
|
|
|
|
|
|
Both Software Science, cyclomatic complexity: |
|
Strong theoretical challenges |
|
Strong experimental challenges |
|
High correlation with LOC |
|
Thus we are measuring LOC, not complexity |
|
Apparent contradiction |
|
LOC is a poor metric for predicting productivity |
|
No contradiction — LOC is used here to predict
fault rates, not productivity |
|
|
|
|
|
Rapid and thorough fault detection |
|
Up to 95% reduction in maintenance costs
[Crossman, 1982] |
|
|
|
|
|
Experiments comparing |
|
Black-box testing |
|
Glass-box testing |
|
Reviews |
|
(Myers, 1978) 59 highly experienced programmers |
|
All three methods equally effective in finding
faults |
|
Code inspections less cost-effective |
|
(Hwang, 1981) |
|
All three methods equally effective |
|
|
|
|
|
Tests of 32 professional programmers, 42
advanced students in two groups (Basili and Selby, 1987) |
|
Professional programmers |
|
Code reading detected more faults |
|
Code reading had a faster fault detection rate |
|
Advanced students, group 1 |
|
No significant difference between the three
methods |
|
Advanced students, group 2 |
|
Code reading and black-box testing were equally
good |
|
Both outperformed glass-box testing |
|
|
|
|
|
Conclusion |
|
Code inspection is at least as successful at
detecting faults as glass-box and black-box testing |
|
|
|
|
|
Different approach to software development |
|
Incorporates |
|
Incremental process model |
|
Formal techniques |
|
Reviews |
|
|
|
|
|
|
|
Case study |
|
1820 lines of FoxBASE (U.S. Naval Underwater
Systems Center, 1992) |
|
18 faults detected by “functional verification” |
|
Informal proofs |
|
19 faults detected in walkthroughs before
compilation |
|
NO compilation errors |
|
NO execution-time failures |
|
|
|
|
|
|
|
Fault counting procedures differ: |
|
Usual paradigms |
|
Count faults after informal testing (once SQA
starts) |
|
Cleanroom |
|
Count faults after inspections (once compilation
starts) |
|
|
|
|
|
Report on 17 Cleanroom products [Linger, 1994] |
|
350,000 line product, team of 70, 18 months |
|
1.0 faults per KLOC |
|
Total of 1 million lines of code |
|
Weighted average: 2.3 faults per KLOC |
|
“[R]emarkable quality achievement” |
|
|
|
|
|
We must inspect classes, objects |
|
We can run test cases on objects |
|
Classical module |
|
About 50 executable statements |
|
Give input arguments, check output arguments |
|
Object |
|
About 30 methods, some with 2, 3 statements |
|
Do not return value to caller—change state |
|
It may not be possible to check
state—information hiding |
|
Method determine balance—need to know accountBalance
before, after |
|
|
|
|
|
Need additional methods to return values of all
state variables |
|
Part of test plan |
|
Conditional compilation |
|
Inherited method may still have to be tested |
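The point about state variables can be sketched as follows. The class `Account` and the accessor `get_balance_for_testing` are hypothetical. Python has no conditional compilation, so the accessor is simply marked as test-only here; in a compiled language it could be excluded from the delivered build.

```python
class Account:
    def __init__(self, opening_balance):
        # State is hidden from callers (information hiding).
        self._account_balance = opening_balance

    def deposit(self, amount):
        # Changes state and returns nothing, so a test cannot check an output argument.
        self._account_balance += amount

    def get_balance_for_testing(self):
        """Test-only accessor that exposes the state variable to the test plan."""
        return self._account_balance


account = Account(100)
balance_before = account.get_balance_for_testing()
account.deposit(50)
balance_after = account.get_balance_for_testing()
```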
|
|
|
|
Java implementation of tree hierarchy |
|
|
|
|
Top half |
|
When displayNodeContents is invoked in BinaryTree,
it uses RootedTree.printRoutine |
|
|
|
|
Bottom half |
|
When displayNodeContents is invoked in class BalancedBinaryTree,
it uses BalancedBinaryTree.printRoutine |
|
|
|
|
|
Bad news |
|
BinaryTree.displayNodeContents must be retested
from scratch when reused in class BalancedBinaryTree |
|
Invokes totally new printRoutine |
|
Worse news |
|
For theoretical reasons, we need to test using
totally different test cases |
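The Java figure is not reproduced here. The hierarchy it describes can be sketched in Python, as an illustration only: the class and method names follow the slides (in snake_case), but the method bodies and return strings are assumptions.

```python
class RootedTree:
    def display_node_contents(self, node):
        return self.print_routine(node)

    def print_routine(self, node):
        return f"RootedTree prints {node}"


class BinaryTree(RootedTree):
    def display_node_contents(self, node):
        # Uses print_routine inherited from RootedTree.
        return "binary: " + self.print_routine(node)


class BalancedBinaryTree(BinaryTree):
    def print_routine(self, node):
        # Overrides print_routine; display_node_contents is inherited unchanged.
        return f"BalancedBinaryTree prints {node}"


from_binary = BinaryTree().display_node_contents("n")
from_balanced = BalancedBinaryTree().display_node_contents("n")
```

The inherited `display_node_contents` is textually identical in both classes, yet it calls a different `print_routine` in each, which is why testing it in `BinaryTree` says nothing about its behavior in `BalancedBinaryTree`.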
|
|
|
|
|
Two testing problems: |
|
Making state variables visible |
|
Minor issue |
|
Retesting before reuse |
|
Arises only when methods interact |
|
We can determine when this retesting is needed
[Harrold, McGregor, and Fitzpatrick, 1992] |
|
Not reasons to abandon the paradigm |
|
|
|
|
|
We need to know when to stop testing |
|
Cost–benefit analysis |
|
Risk analysis |
|
Statistical techniques |
|
|
|
|
|
When a module has too many faults |
|
It is cheaper to redesign, recode |
|
Risk, cost of further faults |
|
|
|
|
|
[Myers, 1979] |
|
47% of the faults in OS/370 were in only 4% of
the modules |
|
[Endres, 1975] |
|
512 faults in 202 modules of DOS/VS (Release 28) |
|
112 of the modules had only one fault |
|
There were modules with 14, 15, 19 and 28
faults, respectively |
|
The latter three were the largest modules in the
product, with over 3000 lines of DOS macro assembler language |
|
The module with 14 faults was relatively small,
and very unstable |
|
A prime candidate for discarding, recoding |
|
|
|
|
|
For every module, management must predetermine
maximum allowed number of faults during testing |
|
If this number is reached |
|
Discard |
|
Redesign |
|
Recode |
|
Maximum number of faults allowed after delivery
is ZERO |
|
|
|
|
Sample black-box test cases |
|
Appendix J contains complete set |
|
|
|
|
Module reuse needs to be built into the product
from the very beginning |
|
Reuse must be a client requirement |
|
Software project management plan must
incorporate reuse |
|